Proposal on Handling Reph in Gurmukhi and Telugu Scripts

Author

  • Nagarjuna Venna
Abstract

Chapter 9 of the Unicode Standard [1] describes the representational model for encoding Indic scripts. Devanagari is described in Section 9.1; the principles of Indic scripts are covered in some detail in the introduction to Devanagari. The descriptions of the remaining Indic scripts are abbreviated, highlighting any differences from Devanagari where appropriate. Some of the problems in this description were clarified by Public Review Issue #37 [2], which focused on consistent handling of Zero Width Joiner (ZWJ) in Indic scripts. That proposal put forth a set of rules for handling ZWJ and ZWNJ that are applicable across all Indic scripts.
The formation of Reph is defined in Section 9.1, Rules for Rendering, R2 of [1]. Reph is defined as a nonspacing combining mark glyph form of U+0930 DEVANAGARI LETTER RA positioned above or attached to the upper part of a base glyph form. Basically, Reph is formed when a RA whose inherent vowel has been killed by the virama begins a syllable. Not all scripts have Reph; if the script in question has a Reph form, the sequence is rendered with Reph on C. Also, for Devanagari, the sequence is always rendered as eyelash-RA instead of Reph.
Devanagari, Bengali, Gujarati, Oriya, and Kannada are listed in [1] and [2] as scripts that have a Reph form. Gurmukhi, Tamil, and Telugu are listed as scripts that do not have a Reph form. Malayalam is described as a script that has Reph in the traditional orthography but not in modern usage. However, both Telugu and Gurmukhi are similar to Malayalam in that Reph was used in ancient texts but is not used in contemporary writing. While several scripts consistently use Reph (in both modern and historic usage), Gurmukhi, Malayalam, and Telugu have variable usage with respect to Reph, and there are special scenarios where users may need to display Reph, typically for reproducing old documents.
Unfortunately, there is no mechanism in the standard for users to indicate to a renderer that Reph should be displayed, if possible, in one of these scripts. The intent of this proposal is to specify an encoding mechanism that allows users of Gurmukhi and Telugu to indicate that Reph should be displayed by
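
As an illustration of the condition described above (a RA whose inherent vowel is killed by the virama at the start of a syllable), the following minimal Python sketch scans text for RA + VIRAMA + consonant sequences in Devanagari, Gurmukhi, and Telugu. It is not the encoding mechanism this proposal specifies. The function name `find_reph_candidates` is hypothetical, and both the consonant range and the "not preceded by a virama" test are simplifying assumptions standing in for real syllable segmentation.

```python
# Block bases for three scripts. RA, VIRAMA, and the main consonant range
# sit at the same offsets within each of these Unicode blocks.
BLOCKS = {"Devanagari": 0x0900, "Gurmukhi": 0x0A00, "Telugu": 0x0C00}
RA_OFFSET = 0x30              # e.g. U+0930 DEVANAGARI LETTER RA
VIRAMA_OFFSET = 0x4D          # e.g. U+094D DEVANAGARI SIGN VIRAMA
CONSONANTS = range(0x15, 0x3A)  # KA..HA (assumption: ignores gaps and later additions)


def find_reph_candidates(text):
    """Return (index, script) pairs where RA + VIRAMA + consonant occurs.

    Assumption: a RA directly preceded by a virama is treated as part of
    the preceding cluster, not as the start of a syllable, and is skipped.
    """
    hits = []
    for i in range(len(text) - 2):
        cp = ord(text[i])
        for script, base in BLOCKS.items():
            if cp != base + RA_OFFSET:
                continue
            if ord(text[i + 1]) != base + VIRAMA_OFFSET:
                continue
            if ord(text[i + 2]) - base not in CONSONANTS:
                continue
            if i > 0 and ord(text[i - 1]) == base + VIRAMA_OFFSET:
                continue  # RA is in the middle of a consonant cluster
            hits.append((i, script))
    return hits


if __name__ == "__main__":
    # RA + VIRAMA + KA in Telugu, then a space, then the same in Gurmukhi.
    sample = "\u0C30\u0C4D\u0C15 \u0A30\u0A4D\u0A15"
    for index, script in find_reph_candidates(sample):
        print(index, script)
```

Whether a renderer actually displays Reph for such a sequence depends on the script and the font; the sketch only locates the candidate sequences in the encoded text.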

Similar resources

A survey on optical character recognition for Bangla and Devanagari scripts

Abstract. The past few decades have witnessed intensive research on optical character recognition (OCR) for Roman, Chinese, and Japanese scripts. A lot of work has also been reported on OCR efforts for various Indian scripts, like Devanagari, Bangla, Oriya, Tamil, Telugu, Malayalam, Kannada, Gurmukhi, Gujarati, etc. In this paper, we present a review of OCR work on Indian scripts, mainly on ...

A survey on feature extraction and classification techniques for character recognition of Indian scripts

Much research has been done by many researchers on Optical Character Recognition system. But most of the work done is on Greek, Chinese, English and Japanese characters. There has not been adequate work on character recognition of Indian languages like Bangla, Marathi, Malayalam, Telugu, Gujarati, Kannada, Gurmukhi and Oriya. The development of handwritten character recognition (HCR) is an inte...

Conversion between Scripts of Punjabi: Beyond Simple Transliteration

This paper describes statistical techniques used for modelling transliteration systems between the scripts of Punjabi language. Punjabi is one of the unique languages, which are written in more than one script. In India, Punjabi is written in Gurmukhi script, while in Pakistan it is written in Shahmukhi (Perso-Arabic) script. Shahmukhi script has its origin in the ancient Phoenician script wher...

Feature Extraction and Classification Techniques in O.C.R. Systems for Handwritten Gurmukhi Script – A Survey

Optical character recognition (OCR) has been a very popular research field since the 1950s. A great deal of work has been done for various scripts, particularly English. But for Indian scripts the research is limited. This paper presents an overview of the various O.C.R. systems developed for handwritten isolated Gurmukhi text. In case of printed Gurmukhi text a lot of rese...

A Complete Machine printed Gurmukhi OCR System

Recognition of Indian language scripts is a challenging problem. Work on the development of complete OCR systems for Indian language scripts is still in its infancy. Complete OCR systems have recently been developed for Devanagari and Bangla scripts. Research in the field of recognition of Gurmukhi script faces major problems mainly related to the unique characteristics of the script like connectiv...

Publication date: 2006